Starting as you mean to go on
24-Nov-2025
Which of these files contains the most recent version of the data?
$ ls -l data/
-rw-r--r--. 1 hannah hannah 0 Sep 23 11:38 sample_metadata_clean.tsv
-rw-r--r--. 1 hannah hannah 0 Sep 23 11:37 sample_metadata.tsv
-rw-r--r--. 1 hannah hannah 0 Sep 23 11:38 sample_metadata_USE_THIS_ONE.tsv
-rw-r--r--. 1 hannah hannah 0 Sep 23 11:38 sample_metadataV2_final.tsv
-rw-r--r--. 1 hannah hannah 0 Sep 23 11:38 sample_metadataV2.tsv
What makes a file name useful? Metadata
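One way to put metadata into a file name is to agree on a fixed pattern and let code build and read the names. The sketch below assumes a hypothetical convention, `<ISO date>_<project>_<description>_v<NN>.<ext>`; the pattern, the field names, and the helper functions are illustrative and not a standard. Adapt them to whatever convention your group documents.

```python
import re
from datetime import date

# Hypothetical convention (an assumption, not a standard):
#   <ISO date>_<project>_<description>_v<NN>.<ext>
# Underscores separate fields; hyphens separate words inside a field.
NAME_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})_(?P<project>[a-z0-9-]+)_"
    r"(?P<description>[a-z0-9-]+)_v(?P<version>\d{2})\.(?P<ext>[a-z0-9]+)$"
)


def build_name(day, project, description, version, ext):
    """Assemble a file name that follows the convention above."""
    return f"{day.isoformat()}_{project}_{description}_v{version:02d}.{ext}"


def parse_name(name):
    """Return the metadata fields stored in a name, or None if it does not conform."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return None
    fields = match.groupdict()
    fields["version"] = int(fields["version"])
    return fields


def latest(names):
    """Return the conforming name with the newest (date, version), or None."""
    parsed = [(parse_name(n), n) for n in names]
    valid = [(p["date"], p["version"], n) for p, n in parsed if p is not None]
    return max(valid)[2] if valid else None
```

With names like these, "which file is the most recent?" has an answer that does not depend on file timestamps: ISO dates sort correctly as plain text, and zero-padded versions break ties. A name such as `sample_metadata_USE_THIS_ONE.tsv` does not match the pattern, so `parse_name` returns `None` for it.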
We discussed file naming conventions and you were supposed to rename 3 files. How did that go?
Where should you look to find the latest version of the protocol you’re interested in testing?
Our lab’s SharePoint is a good example of what not to do…
Which enzyme assay is the one you want?
Choose an organizational style; stick with it
If sharing (with colleagues or future you!), document the organizational style
Divide work into project directories.
Take home: Project directories should be self-contained and hold all files needed to go from raw data to final results
There’s no one best way to organize a project but…
Data is Read-Only! (Your most important goal for setting up a project directory)
Store data-cleaning scripts in a separate folder - create a second read-only ‘clean data’ folder
Generated output is disposable (you should be able to delete anything your scripts generate without concern)*
Save your useful code by wrapping it in share-able functions
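The "data is read-only" rule can be enforced by the file system rather than by memory. The following is a minimal sketch, assuming a POSIX-style permission model; the function name `make_read_only` is made up for this example. It removes write permission from every regular file under a data folder and skips symbolic links, so that shortcuts to large shared data sets are left alone.

```python
import stat
from pathlib import Path

# Write permission for owner, group, and others.
WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def make_read_only(data_dir):
    """Remove write permission from regular files under data_dir.

    Symbolic links are skipped so linked data sets are not modified.
    Returns the list of paths whose permissions were changed.
    """
    changed = []
    for path in sorted(Path(data_dir).rglob("*")):
        if path.is_symlink() or not path.is_file():
            continue
        mode = stat.S_IMODE(path.stat().st_mode)
        if mode & WRITE_BITS:
            path.chmod(mode & ~WRITE_BITS)
            changed.append(path)
    return changed
```

Running it a second time changes nothing, so it is safe to call at the top of a setup script. On Windows, `chmod` only toggles the read-only flag, so treat this as a sketch to adapt and not as a guarantee.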
What subdirectories do folks use?
What questions should you ask when creating a new subdirectory?
Conferences/ Conference presentations, travel administrative documents
Sean_qsip_tree/ Project file for creating a phylogenetic tree with Sean's qSIP project
Literature/ Relevant literature for ARCSS project (automatically integrated into Zotero/Mendeley libraries)
Senescence/ Project to identify likely senescence times for our sites
mimics_webapp/ Project for Stuart's harebrained (but genius) idea to turn MIMICS into a webapp
Picarro Code/ Nascent code for processing Picarro outputs
useful_images/ Helpful images related to the project. Often useful in creating figures or presentations
Protocols/ Protocols related to lab work
Writing/ Writing folder; includes derived grants, manuscripts, etc.
qsip/ FICUS qsip project
Assembly-analysis/ Sub-analyses; files contain code, outputs, figures
cazyme_scraper/ Shortcut to a different project file, where I wrote a code pipeline
CN_versatility/ Sub-analyses; files contain code, outputs, figures
Core_microbiome/ Sub-analyses; files contain code, outputs, figures
data/ Raw data; files never edited; common across collaborators; contains both shortcuts to large data sets and actual files
general_climate_weather/ Sub-analyses; files contain code, outputs, figures
GraftM-analysis/ Collaborator's sub-analyses; I don't have to edit anything in here
identifying-outlier-years/ Sub-analyses; files contain code, outputs, figures
identify-temp-WTD-responders/ Sub-analyses; files contain code, outputs, figures
Metabolic-analysis/ Collaborator's sub-analyses; I don't have to edit anything in here
metadata_availability/ Sub-analyses; files contain code, outputs, figures
quantify_stability_with_time_figure/ Sub-analyses; files contain code, outputs, figures
SingleM-analysis/ Sub-analyses; files contain code, outputs, figures
setup.R Common analysis script that takes raw data and does initial cleaning
README.md Readme file; describes how to set up the code and data on your own computer
temporal_paper.yml Contains instructions for installing the software necessary for running all the code in the project
install_dependencies.sh Secondary installation script for software not covered by temporal_paper.yml
R/ R scripts live here; they include documentation in the form of R Markdown
slurm/ Slurm scripts for submitting jobs to the supercomputer live here
dada2_ernakovich.yml Installation and software information
README.md Tutorial information
.
├── README.md
├── analysis <- all things data analysis
│ └── src <- functions and other source files
├── comm
│ ├── internal_comm <- internal communication such as meeting notes
│ └── journal_comm <- communication with the journal, e.g. peer review
├── data
│ ├── data_clean <- clean version of the data
│ └── data_raw <- raw data (don't touch)
├── dissemination
│ ├── manuscripts
│ ├── posters
│ └── presentations
├── documentation <- documentation, e.g. data management plan
└── misc <- miscellaneous files that don't fit elsewhere
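A layout like the one above is easier to stick with if creating it takes one command. The sketch below rebuilds that tree with the standard library; the function name `create_project` and the README stub text are placeholders for this example, and the folder list is copied from the tree above.

```python
from pathlib import Path

# Subdirectories copied from the example tree above.
LAYOUT = [
    "analysis/src",
    "comm/internal_comm",
    "comm/journal_comm",
    "data/data_clean",
    "data/data_raw",
    "dissemination/manuscripts",
    "dissemination/posters",
    "dissemination/presentations",
    "documentation",
    "misc",
]


def create_project(root):
    """Create the project skeleton under root; existing folders and files are kept."""
    root = Path(root)
    for sub in LAYOUT:
        (root / sub).mkdir(parents=True, exist_ok=True)
    readme = root / "README.md"
    if not readme.exists():
        # Placeholder text: replace with a description of your organizational style.
        readme.write_text(f"# {root.name}\n\nDescribe the organizational style here.\n")
    return root
```

Calling `create_project("my_project")` is safe to repeat: it never overwrites an existing README or removes anything already in the folders.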
Project folders allow you to take advantage of coding and project management tools
Most IDEs (Integrated Development Environments, e.g. RStudio) are set up to allow users to work in and switch easily between projects
Git version tracking - To track your code and files, you set up version tracking inside a project folder.
Sharing a project is easy - simply share the project folder with the collaborator
Either: Create an R-project for code you’re working on now
Or: Think about your preferred management style; Reorganize a project directory around that style. (Don’t forget to document it!)
We discussed directory structures. How did your tasks go?
Descriptive metadata: information about the content and context of your data.
Examples: title, creator, subject keywords, and description (abstract)
Structural metadata describe the physical structure of compound data.
Examples: camera used, aperture, exposure, file format, and relation to other data or files
Administrative metadata: information used to manage your data.
Examples: when and how they were created, who can access them, directory structure, software required to use them, and copyright permissions
More Information: MIMARKS guidelines ~ MIMAG guidelines
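The three kinds of metadata map naturally onto three sections of a README. As a minimal sketch, the hypothetical helper below (`render_readme` is a name made up for this example) turns three dictionaries into README text; which fields you record is your choice, and the examples listed above are reasonable starting keys.

```python
def render_readme(descriptive, structural, administrative):
    """Render three metadata dictionaries as Markdown-style README sections."""
    sections = [
        ("Descriptive metadata", descriptive),
        ("Structural metadata", structural),
        ("Administrative metadata", administrative),
    ]
    lines = []
    for heading, fields in sections:
        lines.append(f"## {heading}")
        for key, value in fields.items():
            lines.append(f"- {key}: {value}")
        lines.append("")  # blank line between sections
    return "\n".join(lines)
```

For example, `render_readme({"title": "..."}, {"file format": "TSV"}, {"software required": "R"})` returns text you can paste into a project's README.md and fill in by hand.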
Either:
We discussed metadata and README files. What was challenging about creating readme files?
Good data habits can be implemented regardless of your experience or computational skill level
Today we’ll go through some checklists you can use to help cultivate good data and coding habits
You’ve made it through “Good Data Habits!”
Choose Your Next Adventure: